Data Mining
Why?
Data is only valuable when we can extract meaningful patterns and insights from it. Data mining is a core discipline in data science that focuses on discovering patterns, relationships, and trends within large datasets. This course equips students with essential techniques such as decision trees and association rules, which are widely used in fields like marketing, healthcare, and recommendation systems. Mastering data mining enables data scientists to make data-driven decisions and uncover hidden value in data.
What?
This course introduces the main concepts and methods of data mining, including data exploration, preprocessing, association rule mining, and classification using decision trees. You will gain hands-on experience in preparing datasets, identifying interesting patterns, and building interpretable predictive models. The course emphasizes practical application and critical thinking when analyzing data.
Curriculum:
Introduction to Data Mining
Overview of the data mining process, types of data mining tasks (classification, clustering, association), and the role of data mining in knowledge discovery.
Data Exploration
Techniques for understanding data characteristics using descriptive statistics, visualizations, and summary reports to guide subsequent analysis steps.
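As a small illustration of descriptive statistics, the sketch below summarizes one numeric column with Python's standard library. The function name `summarize` and the sample values are made up for this example; in practice a library such as pandas would usually be used.

```python
import statistics

def summarize(values):
    """Return basic descriptive statistics for a list of numbers."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }

# Hypothetical sample: ages of seven customers.
ages = [23, 25, 31, 35, 35, 41, 62]
summary = summarize(ages)

for name, value in summary.items():
    print(f"{name}: {value}")
```

Here the mean (36) sits slightly above the median (35) because of the single large value 62, which is the kind of hint about skew and outliers that guides later analysis steps.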
Data Preprocessing
Data cleaning, handling missing values, normalization, encoding, and other preparation steps necessary to improve data quality and modeling outcomes.
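The following is a minimal sketch of three common preparation steps: mean imputation of missing values, min-max normalization, and one-hot encoding. The helper names and the sample data are assumptions made for illustration, and other strategies (median imputation, standardization, ordinal encoding) may suit a given dataset better.

```python
def impute_mean(values):
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]

def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no spread; map everything to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def one_hot(labels):
    """Encode each label as a 0/1 vector over the sorted distinct categories."""
    categories = sorted(set(labels))
    return [[1 if label == c else 0 for c in categories] for label in labels]

# Hypothetical column with one missing value.
raw = [10.0, None, 30.0, 20.0]
filled = impute_mean(raw)       # the gap is filled with the mean of 10, 30, 20
scaled = min_max_scale(filled)  # smallest value maps to 0.0, largest to 1.0

colors = ["red", "blue", "red"]
encoded = one_hot(colors)       # columns follow the sorted order: blue, red
```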
Association Rule Mining
Techniques like Apriori and FP-Growth for discovering frequent itemsets and meaningful association rules from transaction data.
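To make the idea of frequent itemsets and rules concrete, here is a simplified level-wise search in the spirit of Apriori, together with the support and confidence measures. It is a teaching sketch, not an optimized implementation, and it does not cover FP-Growth. The function names and the four toy transactions are invented for this example.

```python
from itertools import combinations

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent) from the transactions."""
    both = support(antecedent | consequent, transactions)
    return both / support(antecedent, transactions)

def frequent_itemsets(transactions, min_support):
    """Level-wise search: grow candidates one item at a time, keeping
    only itemsets whose support reaches min_support."""
    items = sorted({item for t in transactions for item in t})
    current = [frozenset([i]) for i in items]
    frequent = {}
    k = 1
    while current:
        level = {}
        for candidate in current:
            s = support(candidate, transactions)
            if s >= min_support:
                level[candidate] = s
        frequent.update(level)
        k += 1
        # Join step: unions of frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        current = [
            c for c in candidates
            if all(frozenset(sub) in level for sub in combinations(c, k - 1))
        ]
    return frequent

# Hypothetical market-basket data.
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]

frequent = frequent_itemsets(transactions, min_support=0.5)
rule_conf = confidence(frozenset({"butter"}), frozenset({"bread"}), transactions)
```

In this toy data every basket containing butter also contains bread, so the rule butter → bread has confidence 1.0, while the pair {milk, butter} appears in only one of four baskets and is discarded at the 0.5 support threshold.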
Decision Trees
Building and interpreting decision trees for classification tasks. Understanding the splitting criteria used to construct the tree, including the Gini index and information gain (which is based on entropy).
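The two splitting criteria can be computed in a few lines. The sketch below assumes class labels are given as plain Python lists; the function names and the tiny yes/no example are illustrative only and do not build a full tree.

```python
import math
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def entropy(labels):
    """Shannon entropy of the class distribution, in bits."""
    n = len(labels)
    return -sum(
        (count / n) * math.log2(count / n)
        for count in Counter(labels).values()
    )

def information_gain(parent, children):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted

# Hypothetical node with two classes, split into two pure children.
parent = ["yes", "yes", "no", "no"]
left, right = ["yes", "yes"], ["no", "no"]

parent_gini = gini(parent)
parent_entropy = entropy(parent)
gain = information_gain(parent, [left, right])
```

A perfectly balanced two-class node has Gini impurity 0.5 and entropy 1 bit; a split that separates the classes completely removes all of that uncertainty, so its information gain equals the parent's entropy. A tree-building algorithm picks, at each node, the split that scores best under the chosen criterion.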
Notes
This course builds essential foundations for machine learning: many of the ideas introduced here, such as decision trees and association rule mining, recur in more advanced ML systems. More techniques and algorithms will be covered in the Machine Learning course next semester. Take the time to understand each method thoroughly and apply it in practical projects to prepare for what's ahead.